12 research outputs found

    A Three Step Blind Approach for Improving HPC Systems' Energy Performance

    Nowadays, there is no doubt that energy consumption has become a limiting factor in the design and operation of high performance computing (HPC) systems. This is evidenced by the rise of efforts from both academia and industry to reduce the energy consumption of those systems. Unlike hardware solutions, software initiatives targeting HPC systems' energy consumption are, despite their effectiveness, often limited for reasons including: (i) the program-specific nature of the proposed solution; (ii) the need for a deep understanding of the applications at hand; (iii) solutions that are difficult for novices to use and/or are designed for single-task environments. This paper proposes a three-step, blind, system-wide, application-independent, fine-grained, and easy-to-use (user-friendly) methodology for improving the energy performance of HPC systems. The methodology breaks down into phase detection, phase characterization, and phase identification and system reconfiguration. It is blind in the sense that it does not require any knowledge from users. It relies upon reconfigurable capabilities offered by the majority of HPC subsystems (including the processor, storage, memory, and communication subsystems) to reduce the overall energy consumption of the system (excluding network equipment) at runtime. We also present an implementation of our methodology through which we demonstrate its effectiveness via static analyses and experiments using benchmarks representative of HPC workloads.

    On the accuracy and usefulness of analytic energy models for contemporary multicore processors

    This paper presents refinements to the execution-cache-memory performance model and a previously published power model for multicore processors. The combination of both enables a very accurate prediction of performance and energy consumption of contemporary multicore processors as a function of relevant parameters such as the number of active cores as well as core and Uncore frequencies. Model validation is performed on the Sandy Bridge-EP and Broadwell-EP microarchitectures. Production-related variations in chip quality are demonstrated through a statistical analysis of the fit parameters obtained on one hundred Broadwell-EP CPUs of the same model. Insights from the models are used to explain the performance- and energy-related behavior of the processors for scalable as well as saturating (i.e., memory-bound) codes. In the process we demonstrate the models' capability to identify optimal operating points with respect to highest performance, lowest energy-to-solution, and lowest energy-delay product, and identify a set of best practices for energy-efficient execution.
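
The three optimization targets named in this abstract can pick different operating points. The sketch below is not the paper's model: it uses only the standard definitions (energy-to-solution = average power × runtime, energy-delay product = energy × runtime) and made-up numbers for a hypothetical memory-bound code whose runtime saturates as the clock rises. The `OperatingPoint` type and `best_points` helper are illustrative names, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_ghz: float   # core clock frequency (hypothetical setting)
    runtime_s: float  # time to solution at this setting
    power_w: float    # average power drawn over the run

    @property
    def energy_j(self) -> float:
        # Energy-to-solution: average power times runtime.
        return self.power_w * self.runtime_s

    @property
    def edp_js(self) -> float:
        # Energy-delay product: energy weighted by runtime.
        return self.energy_j * self.runtime_s


def best_points(points):
    """Return the best operating point under each of the three metrics."""
    return {
        "performance": min(points, key=lambda p: p.runtime_s),
        "energy": min(points, key=lambda p: p.energy_j),
        "edp": min(points, key=lambda p: p.edp_js),
    }


# Assumed sample data, invented for illustration only.
POINTS = [
    OperatingPoint(freq_ghz=1.2, runtime_s=12.0, power_w=60.0),
    OperatingPoint(freq_ghz=1.8, runtime_s=10.0, power_w=75.0),
    OperatingPoint(freq_ghz=2.4, runtime_s=9.5, power_w=95.0),
    OperatingPoint(freq_ghz=3.0, runtime_s=9.4, power_w=120.0),
]

best = best_points(POINTS)
for metric, point in best.items():
    print(f"{metric}: {point.freq_ghz} GHz")
```

With these invented numbers, the highest clock gives the shortest runtime, the lowest clock gives the lowest energy-to-solution, and an intermediate clock minimizes the energy-delay product, which is why the choice of metric matters when selecting an operating point.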

    DVFS-control techniques for dense linear algebra operations on multi-core processors

    This paper analyzes the impact on power consumption of two DVFS-control strategies when applied to the execution of dense linear algebra operations on multi-core processors. The strategies considered here, prototyped as the Slack Reduction Algorithm (SRA) and the Race-to-Idle Algorithm (RIA), adjust the operating frequency of the cores during the execution of a collection of tasks (into which many dense linear algebra algorithms can be decomposed), taking very different approaches to saving energy. A power-aware simulator, in charge of scheduling the execution of tasks to processor cores, is employed to evaluate the performance benefits of these power-control policies for two reference algorithms for the LU factorization, a key operation for the solution of linear systems of equations. The authors from Univ. Jaume I were supported by project CICYT TIN2008-06570-C04 and FEDER. Alonso-Jordá, P.; Dolz Zaragozá, MF.; Igual, FD.; Mayo, R.; Quintana Ortí, ES. (2012). DVFS-control techniques for dense linear algebra operations on multi-core processors. Computer Science - Research and Development. 27(4):289-298. https://doi.org/10.1007/s00450-011-0188-7